Overview

Dataset statistics

Number of variables: 22
Number of observations: 85855
Missing cells: 101139
Missing cells (%): 5.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.4 MiB
Average record size in memory: 176.0 B
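
The missing-cell percentage can be re-derived from the counts in this table. The sketch below assumes the percentage is the number of missing cells divided by the total number of cells (variables × observations); the variable names are illustrative and not taken from the report.

```python
# Counts copied from the dataset statistics in the report.
n_variables = 22
n_observations = 85855
missing_cells = 101139

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Under that assumption the result rounds to the 5.4% shown in the report.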

Variable types

Categorical: 13
Numeric: 9

Alerts

imdb_title_id has a high cardinality: 85855 distinct values High cardinality
title has a high cardinality: 82094 distinct values High cardinality
original_title has a high cardinality: 80852 distinct values High cardinality
year has a high cardinality: 113 distinct values High cardinality
date_published has a high cardinality: 22012 distinct values High cardinality
genre has a high cardinality: 1257 distinct values High cardinality
country has a high cardinality: 4907 distinct values High cardinality
language has a high cardinality: 4377 distinct values High cardinality
director has a high cardinality: 34733 distinct values High cardinality
writer has a high cardinality: 66859 distinct values High cardinality
production_company has a high cardinality: 32050 distinct values High cardinality
actors has a high cardinality: 85729 distinct values High cardinality
description has a high cardinality: 83611 distinct values High cardinality
avg_vote is highly correlated with metascore High correlation
votes is highly correlated with usa_gross_income and 2 other fields High correlation
usa_gross_income is highly correlated with votes and 3 other fields High correlation
worlwide_gross_income is highly correlated with usa_gross_income and 2 other fields High correlation
metascore is highly correlated with avg_vote High correlation
reviews_from_users is highly correlated with votes and 3 other fields High correlation
reviews_from_critics is highly correlated with votes and 3 other fields High correlation
writer has 1572 (1.8%) missing values Missing
production_company has 4455 (5.2%) missing values Missing
description has 2115 (2.5%) missing values Missing
metascore has 72550 (84.5%) missing values Missing
reviews_from_users has 7597 (8.8%) missing values Missing
reviews_from_critics has 11797 (13.7%) missing values Missing
budget is highly skewed (γ1 = 176.1867109) Skewed
imdb_title_id is uniformly distributed Uniform
title is uniformly distributed Uniform
original_title is uniformly distributed Uniform
writer is uniformly distributed Uniform
actors is uniformly distributed Uniform
description is uniformly distributed Uniform
imdb_title_id has unique values Unique
budget has 62179 (72.4%) zeros Zeros
usa_gross_income has 70529 (82.1%) zeros Zeros
worlwide_gross_income has 54839 (63.9%) zeros Zeros

Reproduction

Analysis started: 2022-10-04 13:34:33.347704
Analysis finished: 2022-10-04 13:35:30.392595
Duration: 57.04 seconds
Software version: pandas-profiling v3.3.1
Download configuration: config.json

Variables

imdb_title_id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 85855
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                   Count
tt0000009               1
tt1347008               1
tt1347006               1
tt1346973               1
tt1346961               1
Other values (85850)    85850

Length

Max length: 10
Median length: 9
Mean length: 9.011659193
Min length: 9

Characters and Unicode

Total characters: 773696
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
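
As an illustration of those character properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. It is not the report's own implementation. The `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced here, and the standard library exposes the general category but not the script or block properties.

```python
import unicodedata
from collections import Counter

# Readable names for the two general-category codes that occur in the
# example; this mapping is illustrative, not the report's own table.
CATEGORY_NAMES = {"Ll": "Lowercase Letter", "Nd": "Decimal Number"}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample row of imdb_title_id.
counts = category_counts("tt0000009")
for code, n in counts.items():
    print(CATEGORY_NAMES.get(code, code), n)
```

Applied to every value in a column and summed, counts of this kind would give per-category totals like those in the "Most occurring categories" table.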

Unique

Unique: 85855
Unique (%): 100.0%

Sample

1st row: tt0000009
2nd row: tt0000574
3rd row: tt0001892
4th row: tt0002101
5th row: tt0002130

Common Values

Value                   Count    Frequency (%)
tt0000009               1        < 0.1%
tt1347008               1        < 0.1%
tt1347006               1        < 0.1%
tt1346973               1        < 0.1%
tt1346961               1        < 0.1%
tt1346850               1        < 0.1%
tt1346629               1        < 0.1%
tt1346302               1        < 0.1%
tt1346281               1        < 0.1%
tt1345904               1        < 0.1%
Other values (85845)    85845    > 99.9%

Length

Histogram of lengths of the category
Value                   Count    Frequency (%)
tt0000009               1        < 0.1%
tt0003637               1        < 0.1%
tt0001892               1        < 0.1%
tt0002101               1        < 0.1%
tt0002130               1        < 0.1%
tt0002199               1        < 0.1%
tt0002423               1        < 0.1%
tt0002445               1        < 0.1%
tt0002452               1        < 0.1%
tt0002461               1        < 0.1%
Other values (85845)    85845    > 99.9%

Most occurring characters

Value    Count     Frequency (%)
t        171710    22.2%
0        126521    16.4%
1        66918     8.6%
2        59104     7.6%
4        55377     7.2%
3        52321     6.8%
6        51167     6.6%
8        50723     6.6%
5        47360     6.1%
7        46862     6.1%

Most occurring categories

Value               Count     Frequency (%)
Decimal Number      601986    77.8%
Lowercase Letter    171710    22.2%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
0        126521    21.0%
1        66918     11.1%
2        59104     9.8%
4        55377     9.2%
3        52321     8.7%
6        51167     8.5%
8        50723     8.4%
5        47360     7.9%
7        46862     7.8%
9        45633     7.6%
Lowercase Letter
Value    Count     Frequency (%)
t        171710    100.0%

Most occurring scripts

Value     Count     Frequency (%)
Common    601986    77.8%
Latin     171710    22.2%

Most frequent character per script

Common
Value    Count     Frequency (%)
0        126521    21.0%
1        66918     11.1%
2        59104     9.8%
4        55377     9.2%
3        52321     8.7%
6        51167     8.5%
8        50723     8.4%
5        47360     7.9%
7        46862     7.8%
9        45633     7.6%
Latin
Value    Count     Frequency (%)
t        171710    100.0%

Most occurring blocks

Value    Count     Frequency (%)
ASCII    773696    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
t        171710    22.2%
0        126521    16.4%
1        66918     8.6%
2        59104     7.6%
4        55377     7.2%
3        52321     6.8%
6        51167     6.6%
8        50723     6.6%
5        47360     6.1%
7        46862     6.1%

title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 82094
Distinct (%): 95.6%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                   Count
Anna                    10
Darling                 8
Wanted                  7
Vendetta                7
Lucky                   7
Other values (82089)    85816

Length

Max length: 196
Median length: 84
Mean length: 16.9734203
Min length: 1

Characters and Unicode

Total characters: 1457253
Distinct characters: 153
Distinct categories: 16
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 79153
Unique (%): 92.2%

Sample

1st row: Miss Jerry
2nd row: The Story of the Kelly Gang
3rd row: Den sorte drøm
4th row: Cleopatra
5th row: L'Inferno

Common Values

ValueCountFrequency (%)
Anna10
 
< 0.1%
Darling8
 
< 0.1%
Wanted7
 
< 0.1%
Vendetta7
 
< 0.1%
Lucky7
 
< 0.1%
I miserabili7
 
< 0.1%
Solo7
 
< 0.1%
Maya7
 
< 0.1%
Aurora7
 
< 0.1%
Alone7
 
< 0.1%
Other values (82084)85781
99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the7894
 
3.1%
la5049
 
2.0%
il4206
 
1.6%
4017
 
1.6%
di3611
 
1.4%
of2435
 
1.0%
a2428
 
0.9%
del1849
 
0.7%
in1807
 
0.7%
i1781
 
0.7%
Other values (61622)220536
86.3%

Most occurring characters

ValueCountFrequency (%)
169758
 
11.6%
a131288
 
9.0%
e128227
 
8.8%
i100977
 
6.9%
o94336
 
6.5%
n81397
 
5.6%
r75148
 
5.2%
t63652
 
4.4%
l63426
 
4.4%
s54820
 
3.8%
Other values (143)494224
33.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1102498
75.7%
Space Separator169758
 
11.6%
Uppercase Letter154170
 
10.6%
Other Punctuation17654
 
1.2%
Decimal Number6852
 
0.5%
Dash Punctuation5736
 
0.4%
Close Punctuation219
 
< 0.1%
Open Punctuation217
 
< 0.1%
Math Symbol55
 
< 0.1%
Other Letter28
 
< 0.1%
Other values (6)66
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a131288
11.9%
e128227
11.6%
i100977
 
9.2%
o94336
 
8.6%
n81397
 
7.4%
r75148
 
6.8%
t63652
 
5.8%
l63426
 
5.8%
s54820
 
5.0%
d40255
 
3.7%
Other values (47)268972
24.4%
Uppercase Letter
ValueCountFrequency (%)
S13274
 
8.6%
L12911
 
8.4%
T12497
 
8.1%
M9848
 
6.4%
A9117
 
5.9%
B9042
 
5.9%
I8703
 
5.6%
D8540
 
5.5%
C8444
 
5.5%
P7100
 
4.6%
Other values (37)54694
35.5%
Other Punctuation
ValueCountFrequency (%)
'6081
34.4%
.3927
22.2%
:3108
17.6%
,1949
 
11.0%
!1290
 
7.3%
&611
 
3.5%
?426
 
2.4%
/120
 
0.7%
#36
 
0.2%
¡27
 
0.2%
Other values (7)79
 
0.4%
Decimal Number
ValueCountFrequency (%)
21573
23.0%
11094
16.0%
01069
15.6%
3852
12.4%
4456
 
6.7%
9428
 
6.2%
7414
 
6.0%
5408
 
6.0%
6289
 
4.2%
8269
 
3.9%
Math Symbol
ValueCountFrequency (%)
+39
70.9%
=10
 
18.2%
~4
 
7.3%
×2
 
3.6%
Other Number
ValueCountFrequency (%)
½8
53.3%
²4
26.7%
³2
 
13.3%
¼1
 
6.7%
Close Punctuation
ValueCountFrequency (%)
)212
96.8%
]7
 
3.2%
Open Punctuation
ValueCountFrequency (%)
(210
96.8%
[7
 
3.2%
Other Letter
ValueCountFrequency (%)
ª18
64.3%
º10
35.7%
Currency Symbol
ValueCountFrequency (%)
$12
92.3%
£1
 
7.7%
Space Separator
ValueCountFrequency (%)
169758
100.0%
Dash Punctuation
ValueCountFrequency (%)
-5736
100.0%
Other Symbol
ValueCountFrequency (%)
°25
100.0%
Connector Punctuation
ValueCountFrequency (%)
_9
100.0%
Final Punctuation
ValueCountFrequency (%)
»2
100.0%
Initial Punctuation
ValueCountFrequency (%)
«2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1256696
86.2%
Common200557
 
13.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a131288
 
10.4%
e128227
 
10.2%
i100977
 
8.0%
o94336
 
7.5%
n81397
 
6.5%
r75148
 
6.0%
t63652
 
5.1%
l63426
 
5.0%
s54820
 
4.4%
d40255
 
3.2%
Other values (96)423170
33.7%
Common
ValueCountFrequency (%)
169758
84.6%
'6081
 
3.0%
-5736
 
2.9%
.3927
 
2.0%
:3108
 
1.5%
,1949
 
1.0%
21573
 
0.8%
!1290
 
0.6%
11094
 
0.5%
01069
 
0.5%
Other values (37)4972
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1447863
99.4%
None9390
 
0.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
169758
 
11.7%
a131288
 
9.1%
e128227
 
8.9%
i100977
 
7.0%
o94336
 
6.5%
n81397
 
5.6%
r75148
 
5.2%
t63652
 
4.4%
l63426
 
4.4%
s54820
 
3.8%
Other values (77)484834
33.5%
None
ValueCountFrequency (%)
é1055
 
11.2%
à798
 
8.5%
ô727
 
7.7%
è685
 
7.3%
ä636
 
6.8%
á619
 
6.6%
ü558
 
5.9%
í454
 
4.8%
ö442
 
4.7%
ó421
 
4.5%
Other values (56)2995
31.9%

original_title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 80852
Distinct (%): 94.2%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                   Count
Anna                    10
Home                    8
The Three Musketeers    8
Darling                 8
Solo                    8
Other values (80847)    85813

Length

Max length: 196
Median length: 92
Mean length: 15.72144895
Min length: 1

Characters and Unicode

Total characters: 1349765
Distinct characters: 155
Distinct categories: 16
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 77123
Unique (%): 89.8%

Sample

1st row: Miss Jerry
2nd row: The Story of the Kelly Gang
3rd row: Den sorte drøm
4th row: Cleopatra
5th row: L'Inferno

Common Values

ValueCountFrequency (%)
Anna10
 
< 0.1%
Home8
 
< 0.1%
The Three Musketeers8
 
< 0.1%
Darling8
 
< 0.1%
Solo8
 
< 0.1%
Inferno8
 
< 0.1%
Wanted8
 
< 0.1%
Blackout7
 
< 0.1%
Eden7
 
< 0.1%
Maya7
 
< 0.1%
Other values (80842)85776
99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the13614
 
5.7%
of4287
 
1.8%
a2439
 
1.0%
la2198
 
0.9%
de1844
 
0.8%
in1749
 
0.7%
1418
 
0.6%
no1318
 
0.6%
and1302
 
0.5%
to1300
 
0.5%
Other values (62416)207330
86.8%

Most occurring characters

ValueCountFrequency (%)
152944
 
11.3%
e125860
 
9.3%
a110153
 
8.2%
i80503
 
6.0%
o79650
 
5.9%
n77197
 
5.7%
r68591
 
5.1%
t58303
 
4.3%
s53884
 
4.0%
l47826
 
3.5%
Other values (145)494854
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter998494
74.0%
Uppercase Letter172343
 
12.8%
Space Separator152944
 
11.3%
Other Punctuation15560
 
1.2%
Decimal Number6391
 
0.5%
Dash Punctuation3543
 
0.3%
Close Punctuation190
 
< 0.1%
Open Punctuation188
 
< 0.1%
Math Symbol56
 
< 0.1%
Currency Symbol21
 
< 0.1%
Other values (6)35
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e125860
12.6%
a110153
11.0%
i80503
 
8.1%
o79650
 
8.0%
n77197
 
7.7%
r68591
 
6.9%
t58303
 
5.8%
s53884
 
5.4%
l47826
 
4.8%
h41552
 
4.2%
Other values (47)254975
25.5%
Uppercase Letter
ValueCountFrequency (%)
T17828
 
10.3%
S15405
 
8.9%
M11837
 
6.9%
B10896
 
6.3%
L10420
 
6.0%
D10239
 
5.9%
A9828
 
5.7%
C9395
 
5.5%
P7673
 
4.5%
H7650
 
4.4%
Other values (37)61172
35.5%
Other Punctuation
ValueCountFrequency (%)
'4113
26.4%
:3563
22.9%
.3409
21.9%
,1901
12.2%
!1248
 
8.0%
&635
 
4.1%
?397
 
2.6%
/127
 
0.8%
#36
 
0.2%
¡34
 
0.2%
Other values (8)97
 
0.6%
Decimal Number
ValueCountFrequency (%)
21471
23.0%
11075
16.8%
0959
15.0%
3829
13.0%
4398
 
6.2%
9397
 
6.2%
5365
 
5.7%
7359
 
5.6%
6275
 
4.3%
8263
 
4.1%
Math Symbol
ValueCountFrequency (%)
+39
69.6%
=9
 
16.1%
~6
 
10.7%
×2
 
3.6%
Currency Symbol
ValueCountFrequency (%)
$18
85.7%
¢2
 
9.5%
£1
 
4.8%
Other Number
ValueCountFrequency (%)
½8
61.5%
²3
 
23.1%
³2
 
15.4%
Close Punctuation
ValueCountFrequency (%)
)180
94.7%
]10
 
5.3%
Open Punctuation
ValueCountFrequency (%)
(178
94.7%
[10
 
5.3%
Other Symbol
ValueCountFrequency (%)
°10
90.9%
®1
 
9.1%
Other Letter
ValueCountFrequency (%)
ª3
75.0%
º1
 
25.0%
Space Separator
ValueCountFrequency (%)
152944
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3543
100.0%
Connector Punctuation
ValueCountFrequency (%)
_5
100.0%
Initial Punctuation
ValueCountFrequency (%)
«1
100.0%
Final Punctuation
ValueCountFrequency (%)
»1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1170841
86.7%
Common178924
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e125860
 
10.7%
a110153
 
9.4%
i80503
 
6.9%
o79650
 
6.8%
n77197
 
6.6%
r68591
 
5.9%
t58303
 
5.0%
s53884
 
4.6%
l47826
 
4.1%
h41552
 
3.5%
Other values (96)427322
36.5%
Common
ValueCountFrequency (%)
152944
85.5%
'4113
 
2.3%
:3563
 
2.0%
-3543
 
2.0%
.3409
 
1.9%
,1901
 
1.1%
21471
 
0.8%
!1248
 
0.7%
11075
 
0.6%
0959
 
0.5%
Other values (39)4698
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1339467
99.2%
None10298
 
0.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
152944
 
11.4%
e125860
 
9.4%
a110153
 
8.2%
i80503
 
6.0%
o79650
 
5.9%
n77197
 
5.8%
r68591
 
5.1%
t58303
 
4.4%
s53884
 
4.0%
l47826
 
3.6%
Other values (77)484556
36.2%
None
ValueCountFrequency (%)
é1535
14.9%
ô1106
 
10.7%
ä789
 
7.7%
á728
 
7.1%
ü670
 
6.5%
í535
 
5.2%
ö530
 
5.1%
ó499
 
4.8%
è455
 
4.4%
û328
 
3.2%
Other values (58)3123
30.3%

year
Categorical

HIGH CARDINALITY

Distinct: 113
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                 Count
2017                  3329
2018                  3257
2016                  3138
2015                  2977
2014                  2942
Other values (108)    70212

Length

Max length: 13
Median length: 4
Mean length: 4.000104828
Min length: 4

Characters and Unicode

Total characters: 343429
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 1894
2nd row: 1906
3rd row: 1911
4th row: 1912
5th row: 1911

Common Values

ValueCountFrequency (%)
20173329
 
3.9%
20183257
 
3.8%
20163138
 
3.7%
20152977
 
3.5%
20142942
 
3.4%
20192841
 
3.3%
20132783
 
3.2%
20122560
 
3.0%
20112429
 
2.8%
20092298
 
2.7%
Other values (103)57301
66.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
20173329
 
3.9%
20183257
 
3.8%
20163138
 
3.7%
20152977
 
3.5%
20142942
 
3.4%
20192842
 
3.3%
20132783
 
3.2%
20122560
 
3.0%
20112429
 
2.8%
20092298
 
2.7%
Other values (104)57302
66.7%

Most occurring characters

ValueCountFrequency (%)
174835
21.8%
072686
21.2%
958164
16.9%
256117
16.3%
817090
 
5.0%
715889
 
4.6%
614146
 
4.1%
512623
 
3.7%
411258
 
3.3%
310612
 
3.1%
Other values (8)9
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number343420
> 99.9%
Lowercase Letter4
 
< 0.1%
Uppercase Letter3
 
< 0.1%
Space Separator2
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
174835
21.8%
072686
21.2%
958164
16.9%
256117
16.3%
817090
 
5.0%
715889
 
4.6%
614146
 
4.1%
512623
 
3.7%
411258
 
3.3%
310612
 
3.1%
Lowercase Letter
ValueCountFrequency (%)
o1
25.0%
v1
25.0%
i1
25.0%
e1
25.0%
Uppercase Letter
ValueCountFrequency (%)
T1
33.3%
V1
33.3%
M1
33.3%
Space Separator
ValueCountFrequency (%)
2
100.0%
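
The statistics for year point to at least one value that is not a bare four-digit year. Minimum and median length are 4 but maximum length is 13, and the column contains a few letters (T, V, M, o, v, i, e) and two spaces. The report does not show the offending value. The sketch below assumes such values still embed a four-digit year and extracts it before numeric conversion. The helper name `parse_year` and the example input `"TV Movie 2019"` are hypothetical.

```python
import re
from typing import Optional

# A standalone run of exactly four digits, e.g. "1894".
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def parse_year(raw: str) -> Optional[int]:
    """Return the first standalone four-digit number in `raw`, or None."""
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None


# "1894" is the first sample row; the second input is a hypothetical
# example of a value with a non-numeric prefix.
for value in ["1894", "TV Movie 2019"]:
    print(repr(value), "->", parse_year(value))
```

Values for which the helper returns `None` would need manual inspection rather than silent coercion.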

Most occurring scripts

ValueCountFrequency (%)
Common343422
> 99.9%
Latin7
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
174835
21.8%
072686
21.2%
958164
16.9%
256117
16.3%
817090
 
5.0%
715889
 
4.6%
614146
 
4.1%
512623
 
3.7%
411258
 
3.3%
310612
 
3.1%
Latin
ValueCountFrequency (%)
T1
14.3%
V1
14.3%
M1
14.3%
o1
14.3%
v1
14.3%
i1
14.3%
e1
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII343429
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
174835
21.8%
072686
21.2%
958164
16.9%
256117
16.3%
817090
 
5.0%
715889
 
4.6%
614146
 
4.1%
512623
 
3.7%
411258
 
3.3%
310612
 
3.1%
Other values (8)9
 
< 0.1%

date_published
Categorical

HIGH CARDINALITY

Distinct: 22012
Distinct (%): 25.6%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                   Count
2010                    113
2008                    106
1997                    100
1999                    99
2009                    96
Other values (22007)    85341

Length

Max length: 13
Median length: 10
Mean length: 9.681218333
Min length: 4

Characters and Unicode

Total characters: 831181
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 8693
Unique (%): 10.1%

Sample

1st row: 1894-10-09
2nd row: 1906-12-26
3rd row: 1911-08-19
4th row: 1912-11-13
5th row: 1911-03-06

Common Values

ValueCountFrequency (%)
2010113
 
0.1%
2008106
 
0.1%
1997100
 
0.1%
199999
 
0.1%
200996
 
0.1%
198591
 
0.1%
199690
 
0.1%
197588
 
0.1%
201188
 
0.1%
198387
 
0.1%
Other values (22002)84897
98.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2010113
 
0.1%
2008106
 
0.1%
1997100
 
0.1%
199999
 
0.1%
200996
 
0.1%
198591
 
0.1%
199690
 
0.1%
197588
 
0.1%
201188
 
0.1%
198387
 
0.1%
Other values (22003)84899
98.9%

Most occurring characters

ValueCountFrequency (%)
0175452
21.1%
-162584
19.6%
1149685
18.0%
2102736
12.4%
972450
8.7%
830994
 
3.7%
329311
 
3.5%
728518
 
3.4%
627158
 
3.3%
526751
 
3.2%
Other values (9)25542
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number668588
80.4%
Dash Punctuation162584
 
19.6%
Lowercase Letter4
 
< 0.1%
Uppercase Letter3
 
< 0.1%
Space Separator2
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0175452
26.2%
1149685
22.4%
2102736
15.4%
972450
10.8%
830994
 
4.6%
329311
 
4.4%
728518
 
4.3%
627158
 
4.1%
526751
 
4.0%
425533
 
3.8%
Lowercase Letter
ValueCountFrequency (%)
o1
25.0%
v1
25.0%
i1
25.0%
e1
25.0%
Uppercase Letter
ValueCountFrequency (%)
T1
33.3%
V1
33.3%
M1
33.3%
Dash Punctuation
ValueCountFrequency (%)
-162584
100.0%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common831174
> 99.9%
Latin7
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0175452
21.1%
-162584
19.6%
1149685
18.0%
2102736
12.4%
972450
8.7%
830994
 
3.7%
329311
 
3.5%
728518
 
3.4%
627158
 
3.3%
526751
 
3.2%
Other values (2)25535
 
3.1%
Latin
ValueCountFrequency (%)
T1
14.3%
V1
14.3%
M1
14.3%
o1
14.3%
v1
14.3%
i1
14.3%
e1
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII831181
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0175452
21.1%
-162584
19.6%
1149685
18.0%
2102736
12.4%
972450
8.7%
830994
 
3.7%
329311
 
3.5%
728518
 
3.4%
627158
 
3.3%
526751
 
3.2%
Other values (9)25542
 
3.1%

genre
Categorical

HIGH CARDINALITY

Distinct: 1257
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 670.9 KiB

Value                  Count
Drama                  12543
Comedy                 7693
Comedy, Drama          4039
Drama, Romance         3455
Comedy, Romance        2508
Other values (1252)    55617

Length

Max length: 31
Median length: 26
Mean length: 14.64983985
Min length: 3

Characters and Unicode

Total characters: 1257762
Distinct characters: 35
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 385
Unique (%): 0.4%

Sample

1st row: Romance
2nd row: Biography, Crime, Drama
3rd row: Drama
4th row: Drama, History
5th row: Adventure, Drama, Fantasy

Common Values

ValueCountFrequency (%)
Drama12543
 
14.6%
Comedy7693
 
9.0%
Comedy, Drama4039
 
4.7%
Drama, Romance3455
 
4.0%
Comedy, Romance2508
 
2.9%
Comedy, Drama, Romance2293
 
2.7%
Horror2268
 
2.6%
Drama, Thriller1348
 
1.6%
Crime, Drama1343
 
1.6%
Action, Crime, Drama1310
 
1.5%
Other values (1247)47055
54.8%

Length

Histogram of lengths of the category
Value                Count    Frequency (%)
drama                47110    26.8%
comedy               29368    16.7%
romance              14128    8.0%
action               12948    7.4%
thriller             11388    6.5%
crime                11067    6.3%
horror               9557     5.4%
adventure            7590     4.3%
mystery              5225     3.0%
family               3962     2.3%
Other values (15)    23524    13.4%
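
The table above counts individual lowercase genre tokens rather than whole genre strings, so a multi-genre film contributes to several rows. The report does not show its tokenizer. The sketch below assumes genres are separated by commas and lowercased, and the helper name `genre_token_counts` is hypothetical. It uses the five sample rows listed for the genre column.

```python
from collections import Counter

# First five sample rows of the genre column, as listed in the report.
sample_genres = [
    "Romance",
    "Biography, Crime, Drama",
    "Drama",
    "Drama, History",
    "Adventure, Drama, Fantasy",
]


def genre_token_counts(values):
    """Split comma-separated genre strings and count lowercase tokens."""
    counts = Counter()
    for value in values:
        for token in value.split(","):
            token = token.strip().lower()
            if token:  # skip empty fragments such as a trailing comma
                counts[token] += 1
    return counts


counts = genre_token_counts(sample_genres)
print(counts.most_common(3))
```

Counting tokens this way is what lets a film labelled "Comedy, Drama" count toward both comedy and drama.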

Most occurring characters

ValueCountFrequency (%)
r132666
 
10.5%
a128740
 
10.2%
m108441
 
8.6%
,90012
 
7.2%
90012
 
7.2%
e89528
 
7.1%
o84101
 
6.7%
i60595
 
4.8%
y52270
 
4.2%
D47112
 
3.7%
Other values (25)374285
29.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter893320
71.0%
Uppercase Letter180144
 
14.3%
Other Punctuation90012
 
7.2%
Space Separator90012
 
7.2%
Dash Punctuation4274
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r132666
14.9%
a128740
14.4%
m108441
12.1%
e89528
10.0%
o84101
9.4%
i60595
6.8%
y52270
 
5.9%
n44345
 
5.0%
d36960
 
4.1%
t36666
 
4.1%
Other values (9)119008
13.3%
Uppercase Letter
ValueCountFrequency (%)
D47112
26.2%
C40435
22.4%
A22681
12.6%
R14131
 
7.8%
F12045
 
6.7%
H11853
 
6.6%
T11391
 
6.3%
M8955
 
5.0%
S4672
 
2.6%
W3825
 
2.1%
Other values (3)3044
 
1.7%
Other Punctuation
ValueCountFrequency (%)
,90012
100.0%
Space Separator
ValueCountFrequency (%)
90012
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4274
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1073464
85.3%
Common184298
 
14.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r132666
12.4%
a128740
12.0%
m108441
 
10.1%
e89528
 
8.3%
o84101
 
7.8%
i60595
 
5.6%
y52270
 
4.9%
D47112
 
4.4%
n44345
 
4.1%
C40435
 
3.8%
Other values (22)285231
26.6%
Common
ValueCountFrequency (%)
,90012
48.8%
90012
48.8%
-4274
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1257762
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r132666
 
10.5%
a128740
 
10.2%
m108441
 
8.6%
,90012
 
7.2%
90012
 
7.2%
e89528
 
7.1%
o84101
 
6.7%
i60595
 
4.8%
y52270
 
4.2%
D47112
 
3.7%
Other values (25)374285
29.8%

duration
Real number (ℝ≥0)

Distinct: 266
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.3514181
Minimum: 41
Maximum: 808
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 670.9 KiB

Quantile statistics

Minimum: 41
5-th percentile: 73
Q1: 88
median: 96
Q3: 108
95-th percentile: 142
Maximum: 808
Range: 767
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 22.55384799
Coefficient of variation (CV): 0.2247486724
Kurtosis: 40.30157626
Mean: 100.3514181
Median Absolute Deviation (MAD): 10
Skewness: 3.079705205
Sum: 8615671
Variance: 508.6760589
Monotonicity: Not monotonic
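
Several of the duration statistics are derived from one another, which allows a quick consistency check. The sketch below recomputes them from the values reported above, assuming the standard definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, mean = sum / number of observations). The variable names are illustrative.

```python
import math

# Values copied from the duration statistics in the report.
n_observations = 85855
total = 8615671
mean = 100.3514181
std = 22.55384799
variance = 508.6760589
q1, q3 = 88, 108
minimum, maximum = 41, 808

# Recompute the derived quantities from their standard definitions.
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean
mean_from_sum = total / n_observations

print("range:", value_range)
print("IQR:", iqr)
print("CV:", round(cv, 6))
print("variance matches std^2:", math.isclose(std ** 2, variance, rel_tol=1e-6))
print("mean matches sum / n:", math.isclose(mean_from_sum, mean, rel_tol=1e-6))
```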
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
90                    5162     6.0%
95                    3194     3.7%
100                   3106     3.6%
92                    2418     2.8%
93                    2414     2.8%
85                    2308     2.7%
88                    2228     2.6%
94                    2193     2.6%
96                    2177     2.5%
91                    2132     2.5%
Other values (256)    58523    68.2%
Minimum 10 values
Value    Count    Frequency (%)
41       1        < 0.1%
42       1        < 0.1%
43       1        < 0.1%
44       1        < 0.1%
45       62       0.1%
46       26       < 0.1%
47       26       < 0.1%
48       36       < 0.1%
49       19       < 0.1%
50       72       0.1%
Maximum 10 values
Value    Count    Frequency (%)
808      1        < 0.1%
729      1        < 0.1%
580      1        < 0.1%
570      1        < 0.1%
540      3        < 0.1%
485      1        < 0.1%
450      1        < 0.1%
442      1        < 0.1%
439      1        < 0.1%
421      1        < 0.1%

country
Categorical

HIGH CARDINALITY

Distinct: 4907
Distinct (%): 5.7%
Missing: 64
Missing (%): 0.1%
Memory size: 670.9 KiB

Value                  Count
USA                    28511
India                  6065
UK                     4111
Japan                  3077
France                 3055
Other values (4902)    40972

Length

Max length: 225
Median length: 110
Mean length: 7.24057302
Min length: 2

Characters and Unicode

Total characters: 621176
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 3614
Unique (%): 4.2%

Sample

1st row: USA
2nd row: Australia
3rd row: Germany, Denmark
4th row: USA
5th row: Italy

Common Values

ValueCountFrequency (%)
USA28511
33.2%
India6065
 
7.1%
UK4111
 
4.8%
Japan3077
 
3.6%
France3055
 
3.6%
Italy2444
 
2.8%
Canada1802
 
2.1%
Germany1396
 
1.6%
Turkey1351
 
1.6%
Hong Kong1239
 
1.4%
Other values (4897)32740
38.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
usa34325
29.6%
france8311
 
7.2%
uk7490
 
6.4%
india6373
 
5.5%
italy5056
 
4.4%
germany4909
 
4.2%
japan3701
 
3.2%
canada3621
 
3.1%
spain2731
 
2.4%
hong1884
 
1.6%
Other values (211)37739
32.5%

Most occurring characters

ValueCountFrequency (%)
a72788
 
11.7%
n50990
 
8.2%
U42996
 
6.9%
S42087
 
6.8%
A37428
 
6.0%
e36290
 
5.8%
30349
 
4.9%
r29118
 
4.7%
i28551
 
4.6%
,22991
 
3.7%
Other values (48)227588
36.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter375790
60.5%
Uppercase Letter192037
30.9%
Space Separator30349
 
4.9%
Other Punctuation22997
 
3.7%
Open Punctuation1
 
< 0.1%
Close Punctuation1
 
< 0.1%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a72788
19.4%
n50990
13.6%
e36290
9.7%
r29118
 
7.7%
i28551
 
7.6%
l17254
 
4.6%
d16614
 
4.4%
o16082
 
4.3%
t15164
 
4.0%
y13552
 
3.6%
Other values (17)79387
21.1%
Uppercase Letter
ValueCountFrequency (%)
U42996
22.4%
S42087
21.9%
A37428
19.5%
I13509
 
7.0%
K10777
 
5.6%
F9079
 
4.7%
C6333
 
3.3%
G5771
 
3.0%
J3742
 
1.9%
B2891
 
1.5%
Other values (15)17424
9.1%
Other Punctuation
ValueCountFrequency (%)
,22991
> 99.9%
'6
 
< 0.1%
Space Separator
ValueCountFrequency (%)
30349
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin567827
91.4%
Common53349
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a72788
 
12.8%
n50990
 
9.0%
U42996
 
7.6%
S42087
 
7.4%
A37428
 
6.6%
e36290
 
6.4%
r29118
 
5.1%
i28551
 
5.0%
l17254
 
3.0%
d16614
 
2.9%
Other values (42)193711
34.1%
Common
ValueCountFrequency (%)
30349
56.9%
,22991
43.1%
'6
 
< 0.1%
(1
 
< 0.1%
)1
 
< 0.1%
-1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII621170
> 99.9%
None6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a72788
 
11.7%
n50990
 
8.2%
U42996
 
6.9%
S42087
 
6.8%
A37428
 
6.0%
e36290
 
5.8%
30349
 
4.9%
r29118
 
4.7%
i28551
 
4.6%
,22991
 
3.7%
Other values (47)227582
36.6%
None
ValueCountFrequency (%)
ô6
100.0%

language
Categorical

HIGH CARDINALITY

Distinct: 4377
Distinct (%): 5.1%
Missing: 833
Missing (%): 1.0%
Memory size: 670.9 KiB

Value                  Count
English                35939
French                 3903
Spanish                2831
Japanese               2826
Italian                2731
Other values (4372)    36792

Length

Max length: 163
Median length: 7
Mean length: 9.476206158
Min length: 3

Characters and Unicode

Total characters: 805686
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3175
Unique (%): 3.7%

Sample

1st row: None
2nd row: None
3rd row: English
4th row: Italian
5th row: English

Common Values

ValueCountFrequency (%)
English35939
41.9%
French3903
 
4.5%
Spanish2831
 
3.3%
Japanese2826
 
3.3%
Italian2731
 
3.2%
Hindi2106
 
2.5%
German1761
 
2.1%
Turkish1355
 
1.6%
Russian1345
 
1.6%
English, Spanish1108
 
1.3%
Other values (4367)29117
33.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english47453
43.2%
french8164
 
7.4%
spanish5685
 
5.2%
italian4677
 
4.3%
german4606
 
4.2%
japanese3888
 
3.5%
hindi2949
 
2.7%
russian2816
 
2.6%
mandarin1946
 
1.8%
turkish1666
 
1.5%
Other values (258)25947
23.6%

Most occurring characters

ValueCountFrequency (%)
n101920
12.7%
i86785
10.8%
s73457
 
9.1%
h69711
 
8.7%
l59572
 
7.4%
a57434
 
7.1%
g52934
 
6.6%
E47597
 
5.9%
e38510
 
4.8%
r26491
 
3.3%
Other values (51)191275
23.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter646212
80.2%
Uppercase Letter110089
 
13.7%
Space Separator24775
 
3.1%
Other Punctuation24199
 
3.0%
Dash Punctuation333
 
< 0.1%
Decimal Number40
 
< 0.1%
Open Punctuation19
 
< 0.1%
Close Punctuation19
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n101920
15.8%
i86785
13.4%
s73457
11.4%
h69711
10.8%
l59572
9.2%
a57434
8.9%
g52934
8.2%
e38510
 
6.0%
r26491
 
4.1%
u11995
 
1.9%
Other values (16)67403
10.4%
Uppercase Letter
ValueCountFrequency (%)
E47597
43.2%
F9134
 
8.3%
S8151
 
7.4%
G5680
 
5.2%
I5199
 
4.7%
T4491
 
4.1%
H4102
 
3.7%
J3888
 
3.5%
M3414
 
3.1%
P3353
 
3.0%
Other values (16)15080
 
13.7%
Decimal Number
ValueCountFrequency (%)
110
25.0%
410
25.0%
510
25.0%
310
25.0%
Space Separator
ValueCountFrequency (%)
24775
100.0%
Other Punctuation
ValueCountFrequency (%)
,24199
100.0%
Dash Punctuation
ValueCountFrequency (%)
-333
100.0%
Open Punctuation
ValueCountFrequency (%)
(19
100.0%
Close Punctuation
ValueCountFrequency (%)
)19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin756301
93.9%
Common49385
 
6.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n101920
13.5%
i86785
11.5%
s73457
9.7%
h69711
9.2%
l59572
7.9%
a57434
7.6%
g52934
 
7.0%
E47597
 
6.3%
e38510
 
5.1%
r26491
 
3.5%
Other values (42)141890
18.8%
Common
ValueCountFrequency (%)
24775
50.2%
,24199
49.0%
-333
 
0.7%
(19
 
< 0.1%
)19
 
< 0.1%
110
 
< 0.1%
410
 
< 0.1%
510
 
< 0.1%
310
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII805686
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n101920
12.7%
i86785
10.8%
s73457
 
9.1%
h69711
 
8.7%
l59572
 
7.4%
a57434
 
7.1%
g52934
 
6.6%
E47597
 
5.9%
e38510
 
4.8%
r26491
 
3.3%
Other values (51)191275
23.7%

director
Categorical

HIGH CARDINALITY

Distinct: 34733
Distinct (%): 40.5%
Missing: 87
Missing (%): 0.1%
Memory size: 670.9 KiB

Value                   Count
Jesús Franco            87
Michael Curtiz          85
Lesley Selander         78
Lloyd Bacon             73
William Beaudine        70
Other values (34728)    85375

Length

Max length: 62
Median length: 52
Mean length: 14.65699328
Min length: 2

Characters and Unicode

Total characters: 1257101
Distinct characters: 105
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 21465
Unique (%): 25.0%

Sample

1st row: Alexander Black
2nd row: Charles Tait
3rd row: Urban Gad
4th row: Charles L. Gaskill
5th row: Francesco Bertolini, Adolfo Padovan

Common Values

ValueCountFrequency (%)
Jesús Franco87
 
0.1%
Michael Curtiz85
 
0.1%
Lesley Selander78
 
0.1%
Lloyd Bacon73
 
0.1%
William Beaudine70
 
0.1%
Richard Thorpe68
 
0.1%
John Ford67
 
0.1%
Gordon Douglas64
 
0.1%
Raoul Walsh61
 
0.1%
Mervyn LeRoy59
 
0.1%
Other values (34723)85056
99.1%
(Missing)87
 
0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john1672
 
0.9%
david1319
 
0.7%
michael1316
 
0.7%
robert1196
 
0.6%
william913
 
0.5%
richard847
 
0.4%
peter741
 
0.4%
de736
 
0.4%
james721
 
0.4%
paul703
 
0.4%
Other values (31305)182945
94.7%

Most occurring characters

ValueCountFrequency (%)
a118249
 
9.4%
107341
 
8.5%
e101071
 
8.0%
i83255
 
6.6%
n82271
 
6.5%
r81904
 
6.5%
o71171
 
5.7%
l54291
 
4.3%
s43948
 
3.5%
t39334
 
3.1%
Other values (95)474266
37.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter936765
74.5%
Uppercase Letter196738
 
15.7%
Space Separator107341
 
8.5%
Other Punctuation13124
 
1.0%
Dash Punctuation3132
 
0.2%
Decimal Number1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a118249
12.6%
e101071
10.8%
i83255
 
8.9%
n82271
 
8.8%
r81904
 
8.7%
o71171
 
7.6%
l54291
 
5.8%
s43948
 
4.7%
t39334
 
4.2%
h36152
 
3.9%
Other values (46)225119
24.0%
Uppercase Letter
ValueCountFrequency (%)
S17202
 
8.7%
M17033
 
8.7%
J13224
 
6.7%
A13007
 
6.6%
R12412
 
6.3%
C12100
 
6.2%
B11788
 
6.0%
D10144
 
5.2%
L9622
 
4.9%
G9376
 
4.8%
Other values (32)70830
36.0%
Other Punctuation
ValueCountFrequency (%)
.6689
51.0%
,5826
44.4%
'607
 
4.6%
"2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
107341
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3132
100.0%
Decimal Number
Value  Count  Frequency (%)
3      1      100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1133503
90.2%
Common123598
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a118249
 
10.4%
e101071
 
8.9%
i83255
 
7.3%
n82271
 
7.3%
r81904
 
7.2%
o71171
 
6.3%
l54291
 
4.8%
s43948
 
3.9%
t39334
 
3.5%
h36152
 
3.2%
Other values (88)421857
37.2%
Common
ValueCountFrequency (%)
107341
86.8%
.6689
 
5.4%
,5826
 
4.7%
-3132
 
2.5%
'607
 
0.5%
"2
 
< 0.1%
31
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1248298
99.3%
None8803
 
0.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a118249
 
9.5%
107341
 
8.6%
e101071
 
8.1%
i83255
 
6.7%
n82271
 
6.6%
r81904
 
6.6%
o71171
 
5.7%
l54291
 
4.3%
s43948
 
3.5%
t39334
 
3.2%
Other values (49)465463
37.3%
None
ValueCountFrequency (%)
é2084
23.7%
á1156
13.1%
ô709
 
8.1%
í648
 
7.4%
ó587
 
6.7%
ö554
 
6.3%
ü499
 
5.7%
ç300
 
3.4%
ä212
 
2.4%
Ö175
 
2.0%
Other values (36)1879
21.3%

writer
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct66859
Distinct (%)79.3%
Missing1572
Missing (%)1.8%
Memory size670.9 KiB
Jing Wong
 
84
Kuang Ni
 
45
Woody Allen
 
40
Erdogan Tünas
 
35
Leonardo Benvenuti, Piero De Bernardi
 
34
Other values (66854)
84045 

Length

Max length64
Median length52
Mean length24.00411708
Min length2

Characters and Unicode

Total characters2023139
Distinct characters113
Distinct categories6
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique58034
Unique (%)68.9%

Sample

1st rowAlexander Black
2nd rowCharles Tait
3rd rowUrban Gad, Gebhard Schätzler-Perasini
4th rowVictorien Sardou
5th rowDante Alighieri

Common Values

ValueCountFrequency (%)
Jing Wong84
 
0.1%
Kuang Ni45
 
0.1%
Woody Allen40
 
< 0.1%
Erdogan Tünas35
 
< 0.1%
Leonardo Benvenuti, Piero De Bernardi34
 
< 0.1%
Carlo Vanzina, Enrico Vanzina32
 
< 0.1%
Cheh Chang, Kuang Ni31
 
< 0.1%
Giannis Dalianidis29
 
< 0.1%
Ingmar Bergman27
 
< 0.1%
Safa Önal27
 
< 0.1%
Other values (66849)83899
97.7%
(Missing)1572
 
1.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john2418
 
0.8%
david1853
 
0.6%
robert1843
 
0.6%
michael1772
 
0.6%
james1215
 
0.4%
paul1135
 
0.4%
de1111
 
0.4%
richard1072
 
0.4%
william1008
 
0.3%
peter924
 
0.3%
Other values (48157)280912
95.1%

Most occurring characters

ValueCountFrequency (%)
210980
 
10.4%
a183777
 
9.1%
e156493
 
7.7%
n128306
 
6.3%
r126376
 
6.2%
i124811
 
6.2%
o109484
 
5.4%
l83330
 
4.1%
s68336
 
3.4%
t61306
 
3.0%
Other values (103)769940
38.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1438844
71.1%
Uppercase Letter301943
 
14.9%
Space Separator210980
 
10.4%
Other Punctuation66855
 
3.3%
Dash Punctuation4503
 
0.2%
Decimal Number14
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a183777
12.8%
e156493
10.9%
n128306
 
8.9%
r126376
 
8.8%
i124811
 
8.7%
o109484
 
7.6%
l83330
 
5.8%
s68336
 
4.7%
t61306
 
4.3%
h54751
 
3.8%
Other values (48)341874
23.8%
Uppercase Letter
ValueCountFrequency (%)
M26301
 
8.7%
S24660
 
8.2%
J20870
 
6.9%
A20564
 
6.8%
B20027
 
6.6%
C19493
 
6.5%
R17550
 
5.8%
D15814
 
5.2%
G14909
 
4.9%
K14486
 
4.8%
Other values (32)107269
35.5%
Decimal Number
Value  Count  Frequency (%)
0      4      28.6%
5      3      21.4%
7      2      14.3%
1      2      14.3%
3      2      14.3%
9      1      7.1%
Other Punctuation
ValueCountFrequency (%)
,55238
82.6%
.10608
 
15.9%
'1006
 
1.5%
"2
 
< 0.1%
&1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
210980
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4503
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1740787
86.0%
Common282352
 
14.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a183777
 
10.6%
e156493
 
9.0%
n128306
 
7.4%
r126376
 
7.3%
i124811
 
7.2%
o109484
 
6.3%
l83330
 
4.8%
s68336
 
3.9%
t61306
 
3.5%
h54751
 
3.1%
Other values (90)643817
37.0%
Common
ValueCountFrequency (%)
210980
74.7%
,55238
 
19.6%
.10608
 
3.8%
-4503
 
1.6%
'1006
 
0.4%
04
 
< 0.1%
53
 
< 0.1%
72
 
< 0.1%
12
 
< 0.1%
"2
 
< 0.1%
Other values (3)4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2009960
99.3%
None13179
 
0.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
210980
 
10.5%
a183777
 
9.1%
e156493
 
7.8%
n128306
 
6.4%
r126376
 
6.3%
i124811
 
6.2%
o109484
 
5.4%
l83330
 
4.1%
s68336
 
3.4%
t61306
 
3.1%
Other values (55)756761
37.7%
None
ValueCountFrequency (%)
é2927
22.2%
á1798
13.6%
ô1175
8.9%
í1019
 
7.7%
ó876
 
6.6%
ü795
 
6.0%
ö730
 
5.5%
ç448
 
3.4%
è380
 
2.9%
ä340
 
2.6%
Other values (38)2691
20.4%

production_company
Categorical

HIGH CARDINALITY
MISSING

Distinct32050
Distinct (%)39.4%
Missing4455
Missing (%)5.2%
Memory size670.9 KiB
Metro-Goldwyn-Mayer (MGM)
 
1284
Warner Bros.
 
1153
Columbia Pictures
 
914
Paramount Pictures
 
903
Twentieth Century Fox
 
865
Other values (32045)
76281 

Length

Max length101
Median length75
Mean length18.26003686
Min length1

Characters and Unicode

Total characters1486367
Distinct characters129
Distinct categories12
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique22126
Unique (%)27.2%

Sample

1st rowAlexander Black Photoplays
2nd rowJ. and N. Tait
3rd rowFotorama
4th rowHelen Gardner Picture Players
5th rowMilano Film

Common Values

ValueCountFrequency (%)
Metro-Goldwyn-Mayer (MGM)1284
 
1.5%
Warner Bros.1153
 
1.3%
Columbia Pictures914
 
1.1%
Paramount Pictures903
 
1.1%
Twentieth Century Fox865
 
1.0%
Universal Pictures732
 
0.9%
RKO Radio Pictures535
 
0.6%
Mosfilm279
 
0.3%
Universal International Pictures (UI)272
 
0.3%
Canal+231
 
0.3%
Other values (32040)74232
86.5%
(Missing)4455
 
5.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
films11429
 
5.6%
productions11148
 
5.5%
pictures9826
 
4.9%
film9384
 
4.6%
entertainment5027
 
2.5%
company1990
 
1.0%
international1676
 
0.8%
production1431
 
0.7%
metro-goldwyn-mayer1309
 
0.6%
mgm1285
 
0.6%
Other values (25140)148036
73.1%

Most occurring characters

ValueCountFrequency (%)
i123215
 
8.3%
121141
 
8.2%
e100096
 
6.7%
n96877
 
6.5%
o96222
 
6.5%
r91195
 
6.1%
t90440
 
6.1%
a87994
 
5.9%
s71570
 
4.8%
l60630
 
4.1%
Other values (119)546987
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1107193
74.5%
Uppercase Letter226962
 
15.3%
Space Separator121141
 
8.2%
Other Punctuation11473
 
0.8%
Dash Punctuation5272
 
0.4%
Open Punctuation4738
 
0.3%
Close Punctuation4737
 
0.3%
Decimal Number4464
 
0.3%
Math Symbol365
 
< 0.1%
Connector Punctuation12
 
< 0.1%
Other values (2)10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i123215
11.1%
e100096
9.0%
n96877
8.7%
o96222
8.7%
r91195
 
8.2%
t90440
 
8.2%
a87994
 
7.9%
s71570
 
6.5%
l60630
 
5.5%
m53032
 
4.8%
Other values (45)235922
21.3%
Uppercase Letter
ValueCountFrequency (%)
F32415
14.3%
P32019
14.1%
C22473
 
9.9%
M15789
 
7.0%
A13965
 
6.2%
S13249
 
5.8%
B10674
 
4.7%
E10339
 
4.6%
G9267
 
4.1%
I8393
 
3.7%
Other values (28)58379
25.7%
Other Punctuation
ValueCountFrequency (%)
.8575
74.7%
&1258
 
11.0%
'625
 
5.4%
/495
 
4.3%
"230
 
2.0%
,192
 
1.7%
!55
 
0.5%
:17
 
0.1%
@8
 
0.1%
%6
 
0.1%
Other values (3)12
 
0.1%
Decimal Number
Value  Count  Frequency (%)
2      928    20.8%
0      775    17.4%
1      679    15.2%
3      541    12.1%
4      490    11.0%
5      260    5.8%
7      248    5.6%
9      211    4.7%
8      198    4.4%
6      134    3.0%
Math Symbol
ValueCountFrequency (%)
+363
99.5%
~1
 
0.3%
=1
 
0.3%
Open Punctuation
ValueCountFrequency (%)
(4735
99.9%
[3
 
0.1%
Close Punctuation
ValueCountFrequency (%)
)4734
99.9%
]3
 
0.1%
Other Number
ValueCountFrequency (%)
²4
80.0%
½1
 
20.0%
Space Separator
ValueCountFrequency (%)
121141
100.0%
Dash Punctuation
ValueCountFrequency (%)
-5272
100.0%
Connector Punctuation
ValueCountFrequency (%)
_12
100.0%
Other Symbol
ValueCountFrequency (%)
°5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1334155
89.8%
Common152212
 
10.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
i123215
 
9.2%
e100096
 
7.5%
n96877
 
7.3%
o96222
 
7.2%
r91195
 
6.8%
t90440
 
6.8%
a87994
 
6.6%
s71570
 
5.4%
l60630
 
4.5%
m53032
 
4.0%
Other values (83)462884
34.7%
Common
ValueCountFrequency (%)
121141
79.6%
.8575
 
5.6%
-5272
 
3.5%
(4735
 
3.1%
)4734
 
3.1%
&1258
 
0.8%
2928
 
0.6%
0775
 
0.5%
1679
 
0.4%
'625
 
0.4%
Other values (26)3490
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1480947
99.6%
None5420
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i123215
 
8.3%
121141
 
8.2%
e100096
 
6.8%
n96877
 
6.5%
o96222
 
6.5%
r91195
 
6.2%
t90440
 
6.1%
a87994
 
5.9%
s71570
 
4.8%
l60630
 
4.1%
Other values (75)541567
36.6%
None
ValueCountFrequency (%)
é2063
38.1%
á708
 
13.1%
ó576
 
10.6%
í256
 
4.7%
ç253
 
4.7%
ü251
 
4.6%
ú176
 
3.2%
ñ147
 
2.7%
õ142
 
2.6%
ö134
 
2.5%
Other values (34)714
 
13.2%

actors
Categorical

HIGH CARDINALITY
UNIFORM

Distinct85729
Distinct (%)99.9%
Missing69
Missing (%)0.1%
Memory size670.9 KiB
Nobuyo Ôyama, Noriko Ohara, Michiko Nomura, Kaneta Kimotsuki, Kazuya Tatekabe
 
9
Sergey A.
 
6
Bill Corbett, Kevin Murphy, Michael J. Nelson
 
6
Keiji Fujiwara, Satomi Kôrogi, Miki Narahashi, Akiko Yajima
 
4
Trace Beaulieu, Frank Conniff, Joel Hodgson, Mary Jo Pehl, J. Elvis Weinstein
 
3
Other values (85724)
85758 

Length

Max length415
Median length312
Mean length205.166892
Min length7

Characters and Unicode

Total characters17600447
Distinct characters131
Distinct categories9
Distinct scripts2
Distinct blocks3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique85693
Unique (%)99.9%

Sample

1st rowBlanche Bayliss, William Courtenay, Chauncey Depew
2nd rowElizabeth Tait, John Tait, Norman Campbell, Bella Cola, Will Coyne, Sam Crewes, Jack Ennis, John Forde, Vera Linden, Mr. Marshall, Mr. McKenzie, Frank Mills, Ollie Wilson
3rd rowAsta Nielsen, Valdemar Psilander, Gunnar Helsengreen, Emil Albes, Hugo Flink, Mary Hagen
4th rowHelen Gardner, Pearl Sindelar, Miss Fielding, Miss Robson, Helene Costello, Charles Sindelar, Mr. Howard, James R. Waite, Mr. Osborne, Harry Knowles, Mr. Paul, Mr. Brady, Mr. Corker
5th rowSalvatore Papa, Arturo Pirovano, Giuseppe de Liguoro, Pier Delle Vigne, Augusto Milla, Attilio Motta, Emilise Beretta

Common Values

ValueCountFrequency (%)
Nobuyo Ôyama, Noriko Ohara, Michiko Nomura, Kaneta Kimotsuki, Kazuya Tatekabe9
 
< 0.1%
Sergey A.6
 
< 0.1%
Bill Corbett, Kevin Murphy, Michael J. Nelson6
 
< 0.1%
Keiji Fujiwara, Satomi Kôrogi, Miki Narahashi, Akiko Yajima4
 
< 0.1%
Trace Beaulieu, Frank Conniff, Joel Hodgson, Mary Jo Pehl, J. Elvis Weinstein3
 
< 0.1%
Richard Pryor3
 
< 0.1%
Ian McKellen, Martin Freeman, Richard Armitage, Ken Stott, Graham McTavish, William Kircher, James Nesbitt, Stephen Hunter, Dean O'Gorman, Aidan Turner, John Callen, Peter Hambleton, Jed Brophy, Mark Hadlow, Adam Brown3
 
< 0.1%
Mike Stoklasa3
 
< 0.1%
Tomoki Hirose, Yûki Hiyori, Rin Ishikawa, Itsuki Sagara, Yukihiro Takiguchi, Hinako Tanaka, James Takeshi Yamada2
 
< 0.1%
H.B. Halicki, Marion Busia, Jerry Daugirda, James McIntyre, George Cole, Ronald Halicki, Markos Kotsikos, Parnelli Jones, Gary Bettenhausen, Jonathan E. Fricke, Hal McClain, J.C. Agajanian, J.C. Agajanian Jr., Christopher J.C. Agajanian, Billy Englehart2
 
< 0.1%
Other values (85719)85745
99.9%
(Missing)69
 
0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john13834
 
0.6%
michael11431
 
0.5%
david9725
 
0.4%
robert8917
 
0.4%
james8225
 
0.3%
de7155
 
0.3%
richard6857
 
0.3%
paul6811
 
0.3%
lee6235
 
0.3%
peter6143
 
0.3%
Other values (209566)2294434
96.4%

Most occurring characters

ValueCountFrequency (%)
2293981
 
13.0%
a1593877
 
9.1%
e1283545
 
7.3%
,1069507
 
6.1%
n1063343
 
6.0%
i1048544
 
6.0%
r996016
 
5.7%
o852330
 
4.8%
l708858
 
4.0%
s533575
 
3.0%
Other values (121)6156871
35.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11731527
66.7%
Uppercase Letter2427273
 
13.8%
Space Separator2293981
 
13.0%
Other Punctuation1111873
 
6.3%
Dash Punctuation35668
 
0.2%
Decimal Number122
 
< 0.1%
Currency Symbol1
 
< 0.1%
Final Punctuation1
 
< 0.1%
Other Letter1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1593877
13.6%
e1283545
10.9%
n1063343
 
9.1%
i1048544
 
8.9%
r996016
 
8.5%
o852330
 
7.3%
l708858
 
6.0%
s533575
 
4.5%
t502791
 
4.3%
h425908
 
3.6%
Other values (48)2722740
23.2%
Uppercase Letter
ValueCountFrequency (%)
M223504
 
9.2%
S197996
 
8.2%
A166141
 
6.8%
B165554
 
6.8%
C164832
 
6.8%
J151829
 
6.3%
R133317
 
5.5%
D128135
 
5.3%
L121460
 
5.0%
K120119
 
4.9%
Other values (39)854386
35.2%
Decimal Number
Value  Count  Frequency (%)
0      28     23.0%
5      22     18.0%
1      21     17.2%
2      13     10.7%
4      12     9.8%
6      8      6.6%
3      6      4.9%
9      5      4.1%
7      4      3.3%
8      3      2.5%
Other Punctuation
ValueCountFrequency (%)
,1069507
96.2%
.30041
 
2.7%
'12254
 
1.1%
&30
 
< 0.1%
!16
 
< 0.1%
"12
 
< 0.1%
:10
 
< 0.1%
*2
 
< 0.1%
@1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2293981
100.0%
Dash Punctuation
ValueCountFrequency (%)
-35668
100.0%
Currency Symbol
ValueCountFrequency (%)
$1
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%
Other Letter
ValueCountFrequency (%)
ª1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin14158801
80.4%
Common3441646
 
19.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1593877
 
11.3%
e1283545
 
9.1%
n1063343
 
7.5%
i1048544
 
7.4%
r996016
 
7.0%
o852330
 
6.0%
l708858
 
5.0%
s533575
 
3.8%
t502791
 
3.6%
h425908
 
3.0%
Other values (98)5150014
36.4%
Common
ValueCountFrequency (%)
2293981
66.7%
,1069507
31.1%
-35668
 
1.0%
.30041
 
0.9%
'12254
 
0.4%
&30
 
< 0.1%
028
 
< 0.1%
522
 
< 0.1%
121
 
< 0.1%
!16
 
< 0.1%
Other values (13)78
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII17492911
99.4%
None107535
 
0.6%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2293981
 
13.1%
a1593877
 
9.1%
e1283545
 
7.3%
,1069507
 
6.1%
n1063343
 
6.1%
i1048544
 
6.0%
r996016
 
5.7%
o852330
 
4.9%
l708858
 
4.1%
s533575
 
3.1%
Other values (64)6049335
34.6%
None
ValueCountFrequency (%)
é23755
22.1%
á15596
14.5%
í9129
 
8.5%
ô9087
 
8.5%
ü6859
 
6.4%
ó6500
 
6.0%
ö5900
 
5.5%
ç3944
 
3.7%
è3089
 
2.9%
ä2528
 
2.4%
Other values (46)21148
19.7%
Punctuation
ValueCountFrequency (%)
1
100.0%

description
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct83611
Distinct (%)99.8%
Missing2115
Missing (%)2.5%
Memory size670.9 KiB
The story of
 
15
Mail
 
6
The true story of
 
5
In this sequel to
 
5
Based on
 
5
Other values (83606)
83704 

Length

Max length402
Median length336
Mean length160.0632314
Min length2

Characters and Unicode

Total characters13403695
Distinct characters165
Distinct categories20
Distinct scripts3
Distinct blocks5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique83527
Unique (%)99.7%

Sample

1st rowThe adventures of a female reporter in the 1890s.
2nd rowTrue story of notorious Australian outlaw Ned Kelly (1855-80).
3rd rowTwo men of high rank are both wooing the beautiful and famous equestrian acrobat Stella. While Stella ignores the jeweler Hirsch, she accepts Count von Waldberg's offer to follow her home, ...
4th rowThe fabled queen of Egypt's affair with Roman general Marc Antony is ultimately disastrous for both of them.
5th rowLoosely adapted from Dante's Divine Comedy and inspired by the illustrations of Gustav Doré the original silent film has been restored and has a new score by Tangerine Dream.

Common Values

ValueCountFrequency (%)
The story of15
 
< 0.1%
Mail6
 
< 0.1%
The true story of5
 
< 0.1%
In this sequel to5
 
< 0.1%
Based on5
 
< 0.1%
Emil goes to Berlin to see his grandmother with a large amount of money and is offered sweets by a strange man that make him sleep. He wakes up at his stop with no money. It is up to him and a group of children to save the day.4
 
< 0.1%
Tom Sawyer and his pal Huckleberry Finn have great adventures on the Mississippi River, pretending to be pirates, attending their own funeral and witnessing a murder.4
 
< 0.1%
Desperate measures are taken by a man who tries to save his family from the dark side of the law, after they commit an unexpected crime.4
 
< 0.1%
During World War II, a teenage Jewish girl named Anne Frank and her family are forced into hiding in the Nazi-occupied Netherlands.4
 
< 0.1%
After she loses her mobile phone, a lawyer receives a call from the person who found it. They talk and hit it off very quickly. But she's in shock when she sees that he's very short.3
 
< 0.1%
Other values (83601)83685
97.5%
(Missing)2115
 
2.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
a130832
 
5.6%
the111636
 
4.8%
to71782
 
3.1%
of65803
 
2.8%
and61890
 
2.7%
in51753
 
2.2%
his38317
 
1.6%
is35964
 
1.5%
with23353
 
1.0%
her22794
 
1.0%
Other values (84674)1715088
73.6%

Most occurring characters

ValueCountFrequency (%)
2245469
16.8%
e1256194
 
9.4%
a893557
 
6.7%
t845626
 
6.3%
i800526
 
6.0%
o777423
 
5.8%
n767479
 
5.7%
r713250
 
5.3%
s712075
 
5.3%
h546308
 
4.1%
Other values (155)3845788
28.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10396199
77.6%
Space Separator2245476
 
16.8%
Uppercase Letter342324
 
2.6%
Other Punctuation336402
 
2.5%
Decimal Number39039
 
0.3%
Dash Punctuation30712
 
0.2%
Open Punctuation6804
 
0.1%
Close Punctuation6352
 
< 0.1%
Currency Symbol285
 
< 0.1%
Math Symbol29
 
< 0.1%
Other values (10)73
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1256194
12.1%
a893557
 
8.6%
t845626
 
8.1%
i800526
 
7.7%
o777423
 
7.5%
n767479
 
7.4%
r713250
 
6.9%
s712075
 
6.8%
h546308
 
5.3%
l440323
 
4.2%
Other values (47)2643438
25.4%
Uppercase Letter
ValueCountFrequency (%)
A55266
16.1%
T32942
 
9.6%
S26411
 
7.7%
M18411
 
5.4%
C18018
 
5.3%
B17744
 
5.2%
I17021
 
5.0%
H16227
 
4.7%
W15802
 
4.6%
D11908
 
3.5%
Other values (35)112574
32.9%
Other Punctuation
ValueCountFrequency (%)
.185255
55.1%
,108716
32.3%
'26768
 
8.0%
"8711
 
2.6%
:2185
 
0.6%
?1882
 
0.6%
;1266
 
0.4%
/678
 
0.2%
!542
 
0.2%
&303
 
0.1%
Other values (9)96
 
< 0.1%
Decimal Number
Value  Count  Frequency (%)
1      9488   24.3%
0      6927   17.7%
9      5749   14.7%
2      3765   9.6%
8      2400   6.1%
3      2333   6.0%
5      2305   5.9%
4      2241   5.7%
7      1932   4.9%
6      1899   4.9%
Math Symbol
ValueCountFrequency (%)
+16
55.2%
~7
24.1%
=4
 
13.8%
¬1
 
3.4%
±1
 
3.4%
Modifier Symbol
ValueCountFrequency (%)
`8
53.3%
^2
 
13.3%
¸2
 
13.3%
¨2
 
13.3%
´1
 
6.7%
Other Symbol
ValueCountFrequency (%)
©6
46.2%
°4
30.8%
¦1
 
7.7%
®1
 
7.7%
1
 
7.7%
Currency Symbol
ValueCountFrequency (%)
$270
94.7%
£13
 
4.6%
¢2
 
0.7%
Space Separator
ValueCountFrequency (%)
2245469
> 99.9%
 7
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(6784
99.7%
[20
 
0.3%
Close Punctuation
ValueCountFrequency (%)
)6334
99.7%
]18
 
0.3%
Final Punctuation
ValueCountFrequency (%)
»13
86.7%
2
 
13.3%
Dash Punctuation
ValueCountFrequency (%)
-30712
100.0%
Initial Punctuation
ValueCountFrequency (%)
«16
100.0%
Connector Punctuation
ValueCountFrequency (%)
_8
100.0%
Format
ValueCountFrequency (%)
­2
100.0%
Other Letter
ValueCountFrequency (%)
ª1
100.0%
Control
ValueCountFrequency (%)
1
100.0%
Other Number
ValueCountFrequency (%)
³1
100.0%
Nonspacing Mark
ValueCountFrequency (%)
۪1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10738524
80.1%
Common2665170
 
19.9%
Arabic1
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1256194
11.7%
a893557
 
8.3%
t845626
 
7.9%
i800526
 
7.5%
o777423
 
7.2%
n767479
 
7.1%
r713250
 
6.6%
s712075
 
6.6%
h546308
 
5.1%
l440323
 
4.1%
Other values (93)2985763
27.8%
Common
ValueCountFrequency (%)
2245469
84.3%
.185255
 
7.0%
,108716
 
4.1%
-30712
 
1.2%
'26768
 
1.0%
19488
 
0.4%
"8711
 
0.3%
06927
 
0.3%
(6784
 
0.3%
)6334
 
0.2%
Other values (51)30006
 
1.1%
Arabic
ValueCountFrequency (%)
۪1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII13400163
> 99.9%
None3528
 
< 0.1%
Punctuation2
 
< 0.1%
Arabic1
 
< 0.1%
Specials1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2245469
16.8%
e1256194
 
9.4%
a893557
 
6.7%
t845626
 
6.3%
i800526
 
6.0%
o777423
 
5.8%
n767479
 
5.7%
r713250
 
5.3%
s712075
 
5.3%
h546308
 
4.1%
Other values (80)3842256
28.7%
None
ValueCountFrequency (%)
é1368
38.8%
á317
 
9.0%
í206
 
5.8%
ü165
 
4.7%
ö149
 
4.2%
ó142
 
4.0%
è140
 
4.0%
ç109
 
3.1%
ä103
 
2.9%
ã96
 
2.7%
Other values (62)733
20.8%
Punctuation
ValueCountFrequency (%)
2
100.0%
Arabic
ValueCountFrequency (%)
۪1
100.0%
Specials
ValueCountFrequency (%)
1
100.0%

avg_vote
Real number (ℝ≥0)

HIGH CORRELATION

Distinct89
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.898655873
Minimum1
Maximum9.9
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum1
5-th percentile3.5
Q15.2
median6.1
Q36.8
95-th percentile7.6
Maximum9.9
Range8.9
Interquartile range (IQR)1.6

Descriptive statistics

Standard deviation1.234987351
Coefficient of variation (CV)0.2093675878
Kurtosis0.5978266621
Mean5.898655873
Median Absolute Deviation (MAD)0.7
Skewness-0.7609643007
Sum506429.1
Variance1.525193758
MonotonicityNot monotonic
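Several of the figures above are derived from one another. The short check below recomputes the coefficient of variation, variance, interquartile range, and range from the reported mean, standard deviation, quartiles, and extremes. It uses only numbers printed in this section; the variable names are illustrative.

```python
import math

# Values reported for avg_vote in this section.
mean = 5.898655873
std = 1.234987351
q1, q3 = 5.2, 6.8
minimum, maximum = 1.0, 9.9

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 4), round(variance, 4), round(iqr, 1), round(value_range, 1))
```

Agreement with the reported CV (0.2093675878), variance (1.525193758), IQR (1.6), and range (8.9) is a quick sanity check that the statistics block was transcribed consistently.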
Histogram with fixed size bins (bins=50)
Common values

Value              Count   Frequency (%)
6.4                3407    4.0%
6.2                3347    3.9%
6.5                3335    3.9%
6.3                3318    3.9%
6.6                3195    3.7%
6.1                3139    3.7%
6.7                3085    3.6%
6.8                3073    3.6%
6                  2832    3.3%
7                  2768    3.2%
Other values (79)  54356   63.3%

Minimum 10 values

Value  Count  Frequency (%)
1      16     < 0.1%
1.1    20     < 0.1%
1.2    20     < 0.1%
1.3    25     < 0.1%
1.4    26     < 0.1%
1.5    33     < 0.1%
1.6    49     0.1%
1.7    45     0.1%
1.8    70     0.1%
1.9    68     0.1%

Maximum 10 values

Value  Count  Frequency (%)
9.9    1      < 0.1%
9.8    4      < 0.1%
9.7    3      < 0.1%
9.5    2      < 0.1%
9.4    2      < 0.1%
9.3    7      < 0.1%
9.2    9      < 0.1%
9.1    8      < 0.1%
9      19     < 0.1%
8.9    28     < 0.1%

votes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct14933
Distinct (%)17.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9493.489605
Minimum99
Maximum2278845
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum99
5-th percentile114
Q1205
median484
Q31766.5
95-th percentile33416.2
Maximum2278845
Range2278746
Interquartile range (IQR)1561.5

Descriptive statistics

Standard deviation53574.35954
Coefficient of variation (CV)5.64327363
Kurtosis325.2774404
Mean9493.489605
Median Absolute Deviation (MAD)344
Skewness14.61947943
Sum815063550
Variance2870212000
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
102                   316     0.4%
101                   315     0.4%
105                   309     0.4%
100                   308     0.4%
106                   295     0.3%
112                   292     0.3%
111                   288     0.3%
107                   285     0.3%
113                   285     0.3%
110                   282     0.3%
Other values (14923)  82880   96.5%

Minimum 10 values

Value  Count  Frequency (%)
99     5      < 0.1%
100    308    0.4%
101    315    0.4%
102    316    0.4%
103    276    0.3%
104    268    0.3%
105    309    0.4%
106    295    0.3%
107    285    0.3%
108    275    0.3%

Maximum 10 values

Value    Count  Frequency (%)
2278845  1      < 0.1%
2241615  1      < 0.1%
2002816  1      < 0.1%
1807440  1      < 0.1%
1780147  1      < 0.1%
1755490  1      < 0.1%
1632315  1      < 0.1%
1619920  1      < 0.1%
1604280  1      < 0.1%
1572674  1      < 0.1%

budget
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct2506
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean29193851.35
Minimum0
Maximum3.5 × 10^11
Zeros62179
Zeros (%)72.4%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q3100000
95-th percentile24000000
Maximum3.5 × 10^11
Range3.5 × 10^11
Interquartile range (IQR)100000

Descriptive statistics

Standard deviation1456806506
Coefficient of variation (CV)49.90114147
Kurtosis39661.9928
Mean29193851.35
Median Absolute Deviation (MAD)0
Skewness176.1867109
Sum2.506438108 × 10^12
Variance2.122285197 × 10^18
MonotonicityNot monotonic
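With 72.4% zeros and no missing values, the median and Q1 above are both 0, so they say little about films whose budget is actually recorded. One plausible reading, which this report does not confirm, is that 0 stands for an unknown budget. Under that assumption, the sketch below reports the zero share separately and summarises only the non-zero values. The helper name and the toy numbers are hypothetical and are not taken from the dataset.

```python
from statistics import median


def summarize_excluding_zeros(values):
    """Return (share of zeros, median of the non-zero values)."""
    nonzero = [v for v in values if v != 0]
    zero_share = 1 - len(nonzero) / len(values)
    # With no non-zero values there is nothing to summarise.
    return zero_share, (median(nonzero) if nonzero else None)


# Hypothetical toy budgets (assumption: 0 means "not recorded").
toy = [0, 0, 0, 1_000_000, 2_000_000, 5_000_000, 0, 0]
zero_share, med = summarize_excluding_zeros(toy)
print(zero_share, med)
```

If the zeros turn out to be genuine values, the original statistics stand and this filtering should not be applied.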
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
0                    62179   72.4%
1000000              1004    1.2%
2000000              765     0.9%
3000000              698     0.8%
5000000              655     0.8%
10000000             559     0.7%
500000               525     0.6%
1500000              498     0.6%
4000000              472     0.5%
20000000             454     0.5%
Other values (2496)  18046   21.0%

Minimum 10 values

Value  Count  Frequency (%)
0      62179  72.4%
1      10     < 0.1%
2      6      < 0.1%
3      2      < 0.1%
4      1      < 0.1%
5      3      < 0.1%
6      1      < 0.1%
7      2      < 0.1%
10     5      < 0.1%
11     1      < 0.1%

Maximum 10 values

Value            Count  Frequency (%)
3.5 × 10^11      1      < 0.1%
1.2 × 10^11      1      < 0.1%
8 × 10^10        1      < 0.1%
7 × 10^10        1      < 0.1%
6.62168 × 10^10  1      < 0.1%
5.9 × 10^10      1      < 0.1%
5.5 × 10^10      1      < 0.1%
5 × 10^10        2      < 0.1%
3.5 × 10^10      3      < 0.1%
3 × 10^10        5      < 0.1%

usa_gross_income
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct14858
Distinct (%)17.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3479774.818
Minimum0
Maximum936662225
Zeros70529
Zeros (%)82.1%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile13540658.5
Maximum936662225
Range936662225
Interquartile range (IQR)0

Descriptive statistics

Standard deviation21706663.98
Coefficient of variation (CV)6.237950764
Kurtosis276.9293875
Mean3479774.818
Median Absolute Deviation (MAD)0
Skewness13.36640586
Sum2.98756067 × 10^11
Variance4.711792612 × 10^14
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     70529   82.1%
1000000               19      < 0.1%
1500000               17      < 0.1%
8144                  17      < 0.1%
509                   13      < 0.1%
1400000               13      < 0.1%
2000000               12      < 0.1%
46808                 11      < 0.1%
3270000               11      < 0.1%
1300000               11      < 0.1%
Other values (14848)  15202   17.7%

Minimum 10 values

Value  Count  Frequency (%)
0      70529  82.1%
30     1      < 0.1%
64     1      < 0.1%
72     1      < 0.1%
74     1      < 0.1%
78     1      < 0.1%
80     1      < 0.1%
95     1      < 0.1%
120    1      < 0.1%
147    1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
936662225  1      < 0.1%
858373000  1      < 0.1%
760507625  1      < 0.1%
700426566  1      < 0.1%
678815482  1      < 0.1%
659363944  1      < 0.1%
652270625  1      < 0.1%
623357910  1      < 0.1%
620181382  1      < 0.1%
608581744  1      < 0.1%

worlwide_gross_income
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct30411
Distinct (%)35.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8339442.376
Minimum0
Maximum2797800564
Zeros54839
Zeros (%)63.9%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q3211466
95-th percentile26266775.2
Maximum2797800564
Range2797800564
Interquartile range (IQR)211466

Descriptive statistics

Standard deviation55319615.66
Coefficient of variation (CV)6.633490966
Kurtosis408.2062341
Mean8339442.376
Median Absolute Deviation (MAD)0
Skewness16.02837166
Sum7.159828252 × 10^11
Variance3.060259877 × 10^15
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     54839   63.9%
8144                  15      < 0.1%
46808                 10      < 0.1%
509                   9       < 0.1%
97182                 6       < 0.1%
14000000              5       < 0.1%
2874                  5       < 0.1%
11000000              4       < 0.1%
1500000               4       < 0.1%
220000000             4       < 0.1%
Other values (30401)  30954   36.1%

Minimum 10 values

Value  Count  Frequency (%)
0      54839  63.9%
1      1      < 0.1%
16     1      < 0.1%
17     2      < 0.1%
20     1      < 0.1%
23     1      < 0.1%
24     1      < 0.1%
25     1      < 0.1%
30     1      < 0.1%
32     1      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
2797800564  1      < 0.1%
2790439092  1      < 0.1%
2195169869  1      < 0.1%
2068224036  1      < 0.1%
2048359754  1      < 0.1%
1670401444  1      < 0.1%
1656963790  1      < 0.1%
1518814206  1      < 0.1%
1515048151  1      < 0.1%
1450026933  1      < 0.1%

metascore
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct99
Distinct (%)0.7%
Missing72550
Missing (%)84.5%
Infinite0
Infinite (%)0.0%
Mean55.89688087
Minimum1
Maximum100
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum1
5-th percentile26
Q143
median57
Q369
95-th percentile84
Maximum100
Range99
Interquartile range (IQR)26

Descriptive statistics

Standard deviation17.78487427
Coefficient of variation (CV)0.3181729284
Kurtosis-0.4316876445
Mean55.89688087
Median Absolute Deviation (MAD)13
Skewness-0.1614852504
Sum743708
Variance316.3017529
MonotonicityNot monotonic
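Because 84.5% of metascore values are missing, the statistics above describe only the rows that have a score. The check below recovers that row count from the report's own figures (85855 observations overall, 72550 missing) and confirms that the reported sum and mean are consistent with it. The variable names are illustrative.

```python
import math

# Figures reported in this document.
total_rows = 85855
missing = 72550
reported_sum = 743708
reported_mean = 55.89688087

non_missing = total_rows - missing            # rows that carry a metascore
missing_pct = round(100 * missing / total_rows, 1)
mean_from_sum = reported_sum / non_missing    # mean over non-missing rows only

print(non_missing, missing_pct, round(mean_from_sum, 4))
```

The mean is therefore taken over the non-missing subset, not over all 85855 films, which matters when comparing it with fully populated columns such as avg_vote.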
Histogram with fixed size bins (bins=50)
Common values

Value              Count   Frequency (%)
64                 303     0.4%
55                 296     0.3%
65                 291     0.3%
57                 291     0.3%
61                 284     0.3%
62                 281     0.3%
49                 278     0.3%
66                 275     0.3%
68                 273     0.3%
58                 273     0.3%
Other values (89)  10460   12.2%
(Missing)          72550   84.5%

Minimum 10 values

Value  Count  Frequency (%)
1      7      < 0.1%
3      2      < 0.1%
4      1      < 0.1%
5      4      < 0.1%
6      3      < 0.1%
7      8      < 0.1%
8      8      < 0.1%
9      15     < 0.1%
10     12     < 0.1%
11     16     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
100    16     < 0.1%
99     8      < 0.1%
98     9      < 0.1%
97     14     < 0.1%
96     27     < 0.1%
95     15     < 0.1%
94     27     < 0.1%
93     27     < 0.1%
92     25     < 0.1%
91     42     < 0.1%

reviews_from_users
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct1213
Distinct (%)1.6%
Missing7597
Missing (%)8.8%
Infinite0
Infinite (%)0.0%
Mean46.0408265
Minimum1
Maximum10472
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size670.9 KiB

Quantile statistics

Minimum1
5-th percentile1
Q14
median9
Q327
95-th percentile186
Maximum10472
Range10471
Interquartile range (IQR)23

Descriptive statistics

Standard deviation178.5114112
Coefficient of variation (CV)3.877241673
Kurtosis581.6803158
Mean46.0408265
Median Absolute Deviation (MAD)7
Skewness17.71999159
Sum3603063
Variance31866.32391
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
17546
 
8.8%
26559
 
7.6%
35373
 
6.3%
44581
 
5.3%
53929
 
4.6%
63457
 
4.0%
73045
 
3.5%
82665
 
3.1%
92483
 
2.9%
102177
 
2.5%
Other values (1203)36443
42.4%
(Missing)7597
 
8.8%
Minimum 10 values
Value | Count | Frequency (%)
1 | 7546 | 8.8%
2 | 6559 | 7.6%
3 | 5373 | 6.3%
4 | 4581 | 5.3%
5 | 3929 | 4.6%
6 | 3457 | 4.0%
7 | 3045 | 3.5%
8 | 2665 | 3.1%
9 | 2483 | 2.9%
10 | 2177 | 2.5%
Maximum 10 values
Value | Count | Frequency (%)
10472 | 1 | < 0.1%
8869 | 1 | < 0.1%
8232 | 1 | < 0.1%
7639 | 1 | < 0.1%
7553 | 1 | < 0.1%
7207 | 1 | < 0.1%
6938 | 1 | < 0.1%
6718 | 1 | < 0.1%
5392 | 1 | < 0.1%
5261 | 1 | < 0.1%

reviews_from_critics
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 595
Distinct (%): 0.8%
Missing: 11797
Missing (%): 13.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.47998866
Minimum: 1
Maximum: 999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 670.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 8
Q3: 23
95-th percentile: 126
Maximum: 999
Range: 998
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 58.3391584
Coefficient of variation (CV): 2.122968795
Kurtosis: 34.73872691
Mean: 27.47998866
Median Absolute Deviation (MAD): 6
Skewness: 5.028834999
Sum: 2035113
Variance: 3403.457402
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 8506 | 9.9%
2 | 6822 | 7.9%
3 | 5437 | 6.3%
4 | 4722 | 5.5%
5 | 3884 | 4.5%
6 | 3215 | 3.7%
7 | 2774 | 3.2%
8 | 2451 | 2.9%
9 | 2168 | 2.5%
10 | 1941 | 2.3%
Other values (585) | 32138 | 37.4%
(Missing) | 11797 | 13.7%
Minimum 10 values
Value | Count | Frequency (%)
1 | 8506 | 9.9%
2 | 6822 | 7.9%
3 | 5437 | 6.3%
4 | 4722 | 5.5%
5 | 3884 | 4.5%
6 | 3215 | 3.7%
7 | 2774 | 3.2%
8 | 2451 | 2.9%
9 | 2168 | 2.5%
10 | 1941 | 2.3%
Maximum 10 values
Value | Count | Frequency (%)
999 | 1 | < 0.1%
909 | 1 | < 0.1%
838 | 1 | < 0.1%
833 | 1 | < 0.1%
830 | 1 | < 0.1%
813 | 1 | < 0.1%
782 | 1 | < 0.1%
769 | 1 | < 0.1%
755 | 1 | < 0.1%
740 | 1 | < 0.1%

Interactions

[Figures: 81 interaction plots (Matplotlib v3.5.1) omitted.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation, and +1 total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
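
The calculation described above can be sketched in Python as "rank both variables, then apply the covariance-over-standard-deviations formula to the ranks". This is a minimal illustration using only the standard library, not the code that produced this report; the function names and the sample data are hypothetical, and ties are assumed to receive the average of their rank positions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel between numerator and denominator


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to floating-point error, while Pearson's r stays below 1 because the relationship is not linear.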

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
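
A minimal Python sketch of this formula follows, together with a check of the invariance property mentioned above. It uses only the standard library; the function name and the sample data are hypothetical, and the function assumes neither variable is constant (otherwise the denominator is zero).

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


# Hypothetical data.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 11.0]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign but not the magnitude.
r_flipped = pearson_r(x, [-b for b in y])

print(r, r_rescaled, r_flipped)
```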

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
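
The pair-counting definition above translates directly into a short Python sketch. It implements the plain form of the coefficient (often called τ-a), which treats tied pairs as neither concordant nor discordant; statistical libraries commonly use a tie-adjusted variant, so their results may differ on data with ties. The function name and sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move in the same direction
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y; it counts toward neither.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: 6 pairs in total, 5 concordant and 1 discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```

For this toy data the result is (5 - 1) / 6. The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this report's size.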

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
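
The nullity correlation shown in the heatmap can be illustrated with a small Python sketch: each column is reduced to a 0/1 "value present" indicator, and the indicators of two columns are then correlated. This is an assumption about the general technique, not the report's own code; the function name and the toy columns are hypothetical, and `None` stands in for a missing value.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sqrt(sum((p - ma) ** 2 for p in a))
    sb = sqrt(sum((q - mb) ** 2 for q in b))
    if sa == 0 or sb == 0:
        # A column that is fully present or fully missing has no variation.
        return float("nan")
    return cov / (sa * sb)


# Toy columns: the first two are missing in exactly the same rows,
# the third is missing exactly where the first is present.
metascore = [70, None, None, 55]
critics = [12, None, None, 3]
opposite = [None, 5, 6, None]

print(nullity_correlation(metascore, critics))   # missing together
print(nullity_correlation(metascore, opposite))  # missing in complementary rows
```

A value near +1 means two columns tend to be missing in the same rows, a value near -1 means one is present where the other is missing, and a value near 0 means their missingness is unrelated.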

Sample

First rows

# | imdb_title_id | title | original_title | year | date_published | genre | duration | country | language | director | writer | production_company | actors | description | avg_vote | votes | budget | usa_gross_income | worlwide_gross_income | metascore | reviews_from_users | reviews_from_critics
0 | tt0000009 | Miss Jerry | Miss Jerry | 1894 | 1894-10-09 | Romance | 45 | USA | None | Alexander Black | Alexander Black | Alexander Black Photoplays | Blanche Bayliss, William Courtenay, Chauncey Depew | The adventures of a female reporter in the 1890s. | 5.9 | 154 | 0 | 0 | 0 | NaN | 1.0 | 2.0
1 | tt0000574 | The Story of the Kelly Gang | The Story of the Kelly Gang | 1906 | 1906-12-26 | Biography, Crime, Drama | 70 | Australia | None | Charles Tait | Charles Tait | J. and N. Tait | Elizabeth Tait, John Tait, Norman Campbell, Bella Cola, Will Coyne, Sam Crewes, Jack Ennis, John Forde, Vera Linden, Mr. Marshall, Mr. McKenzie, Frank Mills, Ollie Wilson | True story of notorious Australian outlaw Ned Kelly (1855-80). | 6.1 | 589 | 2250 | 0 | 0 | NaN | 7.0 | 7.0
2 | tt0001892 | Den sorte drøm | Den sorte drøm | 1911 | 1911-08-19 | Drama | 53 | Germany, Denmark | None | Urban Gad | Urban Gad, Gebhard Schätzler-Perasini | Fotorama | Asta Nielsen, Valdemar Psilander, Gunnar Helsengreen, Emil Albes, Hugo Flink, Mary Hagen | Two men of high rank are both wooing the beautiful and famous equestrian acrobat Stella. While Stella ignores the jeweler Hirsch, she accepts Count von Waldberg's offer to follow her home, ... | 5.8 | 188 | 0 | 0 | 0 | NaN | 5.0 | 2.0
3 | tt0002101 | Cleopatra | Cleopatra | 1912 | 1912-11-13 | Drama, History | 100 | USA | English | Charles L. Gaskill | Victorien Sardou | Helen Gardner Picture Players | Helen Gardner, Pearl Sindelar, Miss Fielding, Miss Robson, Helene Costello, Charles Sindelar, Mr. Howard, James R. Waite, Mr. Osborne, Harry Knowles, Mr. Paul, Mr. Brady, Mr. Corker | The fabled queen of Egypt's affair with Roman general Marc Antony is ultimately disastrous for both of them. | 5.2 | 446 | 45000 | 0 | 0 | NaN | 25.0 | 3.0
4 | tt0002130 | L'Inferno | L'Inferno | 1911 | 1911-03-06 | Adventure, Drama, Fantasy | 68 | Italy | Italian | Francesco Bertolini, Adolfo Padovan | Dante Alighieri | Milano Film | Salvatore Papa, Arturo Pirovano, Giuseppe de Liguoro, Pier Delle Vigne, Augusto Milla, Attilio Motta, Emilise Beretta | Loosely adapted from Dante's Divine Comedy and inspired by the illustrations of Gustav Doré the original silent film has been restored and has a new score by Tangerine Dream. | 7.0 | 2237 | 0 | 0 | 0 | NaN | 31.0 | 14.0
5 | tt0002199 | From the Manger to the Cross; or, Jesus of Nazareth | From the Manger to the Cross; or, Jesus of Nazareth | 1912 | 1913 | Biography, Drama | 60 | USA | English | Sidney Olcott | Gene Gauntier | Kalem Company | R. Henderson Bland, Percy Dyer, Gene Gauntier, Alice Hollister, Samuel Morgan, James D. Ainsley, Robert G. Vignola, George Kellog, J.P. McGowan | An account of the life of Jesus Christ, based on the books of the New Testament: After Jesus' birth is foretold to his parents, he is born in Bethlehem, and is visited by shepherds and wise... | 5.7 | 484 | 0 | 0 | 0 | NaN | 13.0 | 5.0
6 | tt0002423 | Madame DuBarry | Madame DuBarry | 1919 | 1919-11-26 | Biography, Drama, Romance | 85 | Germany | German | Ernst Lubitsch | Norbert Falk, Hanns Kräly | Projektions-AG Union (PAGU) | Pola Negri, Emil Jannings, Harry Liedtke, Eduard von Winterstein, Reinhold Schünzel, Else Berna, Fred Immler, Gustav Czimeg, Karl Platen, Bernhard Goetzke, Magnus Stifter, Paul Biensfeldt, Willy Kaiser-Heyl, Alexander Ekert, Robert Sortsch-Pla | The story of Madame DuBarry, the mistress of Louis XV of France, and her loves in the time of the French revolution. | 6.8 | 753 | 0 | 0 | 0 | NaN | 12.0 | 9.0
7 | tt0002445 | Quo Vadis? | Quo Vadis? | 1913 | 1913-03-01 | Drama, History | 120 | Italy | Italian | Enrico Guazzoni | Henryk Sienkiewicz, Enrico Guazzoni | Società Italiana Cines | Amleto Novelli, Gustavo Serena, Carlo Cattaneo, Amelia Cattaneo, Lea Giunchi, Bruto Castellani, Augusto Mastripietri, Cesare Moltini, Olga Brandini, Ignazio Lupi, Giovanni Gizzi, Lia Orlandini, Matilde Guillaume, Ida Carloni Talli, Giuseppe Gambardella | An epic Italian film "Quo Vadis" influenced many of the later movies. | 6.2 | 273 | 45000 | 0 | 0 | NaN | 7.0 | 5.0
8 | tt0002452 | Independenta Romaniei | Independenta Romaniei | 1912 | 1912-09-01 | History, War | 120 | Romania | None | Aristide Demetriade, Grigore Brezeanu | Aristide Demetriade, Petre Liciu | Societatea Filmului de Arta Leon Popescu | Aristide Demetriade, Constanta Demetriade, Constantin Nottara, Pepi Machauer, Aurel Athanasescu, Jeny Metaxa-Doro, Nicolae Soreanu, Vasile Toneanu, Aristita Romanescu, Elvire Popesco, M. Vîrgolici, C. Nedelcovici, Mihail Tancovici-Cosmin, Ion Dumitrescu, Gheorghe Meliseanu | The movie depicts the Romanian War of Independence (1877-1878). | 6.7 | 198 | 400000 | 0 | 0 | NaN | 4.0 | 1.0
9 | tt0002461 | Richard III | Richard III | 1912 | 1912-10-15 | Drama | 55 | France, USA | English | André Calmettes, James Keane | James Keane, William Shakespeare | Le Film d'Art | Robert Gemp, Frederick Warde, Albert Gardner, James Keane, George Moss, Howard Stuart, Virginia Rankin, Violet Stuart, Carey Lee, Carlotta De Felice | Richard of Gloucester uses manipulation and murder to gain the English throne. | 5.5 | 225 | 30000 | 0 | 0 | NaN | 8.0 | 1.0

Last rows

# | imdb_title_id | title | original_title | year | date_published | genre | duration | country | language | director | writer | production_company | actors | description | avg_vote | votes | budget | usa_gross_income | worlwide_gross_income | metascore | reviews_from_users | reviews_from_critics
85845 | tt9904250 | La reina de los lagartos | La reina de los lagartos | 2019 | 2019-10-05 | Fantasy | 63 | None | Spanish, Catalan | Juan González, Nando Martínez | Juan González, Nando Martínez | Aquí y Allí Films | Javier Botet, Bruna Cusí, Miki Esparbé, Ivan Labanda | A spaceship is about to come to pick up Javi, so him and Berta have to put an end to their summer love. | 4.8 | 103 | 0 | 0 | 0 | NaN | NaN | 5.0
85846 | tt9904802 | Enemy Lines | Enemy Lines | 2020 | 2020-05-04 | War | 92 | UK | English, Polish, Russian, German | Anders Banke | Michael Wright, Tom George | Happy Hour Films | Ed Westwick, John Hannah, Tom Wisdom, Corey Johnson, Pawel Delag, Gary Grant, Daniel Jillings, Scott Haining, Ekaterina Vladimirova, Vladimir Epifantsev, Kirill Pletnyov, Patrik Karlson, Andrey Karako, Jean-Marc Birkholz, Aleksandr Zlatopolskiy | In the frozen, war torn landscape of occupied Poland during World War II, a crack team of allied commandos are sent on a deadly mission behind enemy lines to extract a rocket scientist from the hands of the Nazis. | 5.0 | 764 | 0 | 0 | 0 | NaN | 29.0 | 6.0
85847 | tt9905412 | Ottam | Ottam | 2019 | 2019-03-08 | Drama | 120 | India | Malayalam | Zam | Rajesh k Narayan | Thomas Thiruvalla Films | Nandu Anand, Roshan Ullas, Manikandan R. Achari, Alencier Ley Lopez, Kalabhavan Shajohn, Rohini, Madhuri Dilip, Althaf, Sudheer Karamana, Thezni Khan, Rajesh Sharma | Set in Trivandrum, the story of Ottam unfolds in a day, and progresses through the lives of two youngsters - Abhi and Vinay. What does destiny have in store for these young men? | 7.4 | 494 | 4000000 | 0 | 4791 | NaN | 1.0 | NaN
85848 | tt9905462 | Pengalila | Pengalila | 2019 | 2019-03-08 | Drama | 111 | India | Malayalam | T.V. Chandran | T.V. Chandran | Benzy Productions | Lal, Akshara Kishor, Iniya, Narain, Renji Panicker, Indrans, Priyanka Nair | An unusual bond between a sixty year old Dalit worker Azhagan and an eight year old middle class girl Radha. Within no time their bond grows stronger. However, his proximity to Radha and her mother doesn't go down well with Radha's father. | 8.8 | 553 | 10000000 | 0 | 0 | NaN | NaN | NaN
85849 | tt9906644 | Manoharam | Manoharam | 2019 | 2019-09-27 | Comedy, Drama | 122 | India | Malayalam | Anvar Sadik | None | chakkalakal Films | Vineeth Sreenivasan, Aparna Das, Basil Joseph, Indrans, Delhi Ganesh, Deepak Parambol, Hareesh Peradi, Nandu, Sreelakshmi, Ahamed Siddique, Nandini Sree, V.K. Prakash, Kalaranjini, Jude Anthany Joseph, Nisthar Sait | Manoharan is a poster artist struggling to find respect for his profession, after the advent of printing technology. He tries hard to get into the mainstream, by picking up design software skills. Will he succeed? | 6.8 | 491 | 0 | 0 | 0 | NaN | 9.0 | 1.0
85850 | tt9908390 | Le lion | Le lion | 2020 | 2020-01-29 | Comedy | 95 | France, Belgium | French | Ludovic Colbeau-Justin | Alexandre Coquelle, Matthieu Le Naour | Monkey Pack Films | Dany Boon, Philippe Katerine, Anne Serra, Samuel Jouy, Sophie Verbeeck, Carole Brana, Benoît Pétré, Aksel Ustun, Mathieu Lardot, Olivier Sa, Julien Prevost, Antoine Mathieu, David Ban, Stan, Guillaume Clémencin | A psychiatric hospital patient pretends to be crazy. In charge of caring for this patient, a caregiver will begin to doubt the mental state of his "protégé". | 5.3 | 398 | 0 | 0 | 3507171 | NaN | NaN | 4.0
85851 | tt9911196 | De Beentjes van Sint-Hildegard | De Beentjes van Sint-Hildegard | 2020 | 2020-02-13 | Comedy, Drama | 103 | Netherlands | German, Dutch | Johan Nijenhuis | Radek Bajgar, Herman Finkers | Johan Nijenhuis & Co | Herman Finkers, Johanna ter Steege, Leonie ter Braak, Stef Assen, Annie Beumers, Jos Brummelhuis, Reinier Bulder, Daphne Bunskoek, Karlijn Koel, Karlijn Lansink, Marieke Lustenhouwer, Jan Roerink, Ferdi Stofmeel, Aniek Stokkers, Belinda van der Stoep | A middle-aged veterinary surgeon believes his wife pampers him too much. In order to get away from her, he fakes the onset of dementia. | 7.7 | 724 | 0 | 0 | 7299062 | NaN | 6.0 | 4.0
85852 | tt9911774 | Padmavyuhathile Abhimanyu | Padmavyuhathile Abhimanyu | 2019 | 2019-03-08 | Drama | 130 | India | Malayalam | Vineesh Aaradya | Vineesh Aaradya, Vineesh Aaradya | RMCC Productions | Anoop Chandran, Indrans, Sona Nair, Simon Britto Rodrigues | None | 7.9 | 265 | 0 | 0 | 0 | NaN | NaN | NaN
85853 | tt9914286 | Sokagin Çocuklari | Sokagin Çocuklari | 2019 | 2019-03-15 | Drama, Family | 98 | Turkey | Turkish | Ahmet Faik Akinci | Ahmet Faik Akinci, Kasim Uçkan | Gizem Ajans | Ahmet Faik Akinci, Belma Mamati, Metin Keçeci, Burhan Sirmabiyik, Orhan Aydin, Tevfik Yapici, Yusuf Eksi, Toygun Ates, Aziz Özuysal, Dilek Ölekli, Arcan Bunial, Seval Hislisoy, Ergül Çolakoglu, Gülçin Ugur, Ibrahim Balaban | None | 6.4 | 194 | 0 | 0 | 2833 | NaN | NaN | NaN
85854 | tt9914942 | La vida sense la Sara Amat | La vida sense la Sara Amat | 2019 | 2020-02-05 | Drama | 74 | Spain | Catalan | Laura Jou | Coral Cruz, Pep Puig | La Xarxa de Comunicació Local | Maria Morera Colomer, Biel Rossell Pelfort, Isaac Alcayde, Lluís Altés, Joan Amargós, Pepo Blasco, Cesc Casanovas, Oriol Cervera, Pau Escobar, Jordi Figueras, Arés Fuster, Judit Martín, Martí Múrcia, Mariona Pagès, Francesca Piñón | Pep, a 13-year-old boy, is in love with a girl from his grandparents village, Sara Amat. One summer night Sara disappears without a trace. After a few hours, Pep finds her hiding in his room. | 6.7 | 102 | 0 | 0 | 59794 | NaN | NaN | 2.0